Nonlinear growth models represent an instance of nonlinear regression models, a class of models taking the general form \[ y = \mu(x, \theta) + \epsilon, \] where \(\mu(x, \theta)\) is the mean function which depends on a possibly vector-valued parameter \(\theta\), and a possibly vector-valued predictor \(x\). The stochastic component \(\epsilon\) represents the error with mean zero and constant variance. Usually, a Gaussian distribution is also assumed for the error term.
By defining the mean function \(\mu(x, \theta)\) we may obtain several different models, all characterized by the fact that parameters \(\theta\) enter in a nonlinear way into the equation. Parameters are usually estimated by nonlinear least squares which aims at minimizing the residual sum of squares.
\[ \mu(x) = \theta_1 \exp\{\theta_2 x\} \] where \(\theta_1\) is the value at the origin (i.e. \(\mu(x=0)\)), and \(\theta_2\) represents the (constant) relative ratio of change (i.e. \(\frac{d\mu(x)}{dx }\frac{1}{\mu(x)} = \theta_2\)). Thus, the model describes an increasing (exponential growth if \(\theta_2 > 0\)) or decreasing (exponential decay if \(\theta_2 < 0\)) trend with constant relative rate.
\[ \mu(x) = \frac{\theta_1}{1+\exp\{(\theta_2 - x)/\theta_3\}} \] where \(\theta_1\) is the upper horizontal asymptote, \(\theta_2\) represents the x-value at the inflection point of the symmetric growth curve, and \(\theta_3\) represents a scale parameter (and \(1/\theta_3\) is the growth-rate parameter that controls how quickly the curve approaches the upper asymptote).
\[ \mu(x) = \theta_1 \exp\{-\theta_2 \theta_3^x\} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the value of the function at \(x = 0\) (displacement along the x-axis), and \(\theta_3\) represents a scale parameter.
The difference between the logistic and Gompertz functions is that the latter is not symmetric around the inflection point.
\[ \mu(x) = \theta_1 (1 - \exp\{-\theta_2 x\})^{\theta_3} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the rate of growth, and \(\theta_3\) in part determines the point of inflection on the y-axis.
Dipartimento della Protezione Civile: COVID-19 Italia - Monitoraggio della situazione http://arcg.is/C1unv
Source: https://github.com/pcm-dpc/COVID-19
url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
COVID19 <- read.csv(file = url, stringsAsFactors = FALSE)
COVID19$data <- as.Date(COVID19$data)
# DT::datatable(COVID19)Warnings
- 15/04/2020: dati Regione Friuli Venezia Giulia ricalcolati (ricalcolo di isolamento domiciliare e dimessi/guariti)
- 12/04/2020: dati P.A. Bolzano ricalcolati (ricalcolo dati guariti -110 rispetto a ieri).
- 10/04/2020: dati Regione Molise parziali (dato tamponi non aggiornato).
- 29/03/2020: dati Regione Emilia Romagna parziali (dato tampone non aggiornato).
- 26/03/2020: dati Regione Piemonte parziali (-50 deceduti - comunicazione tardiva)
- 18/03/2020: dati Regione Campania non pervenuti.
- 18/03/2020: dati Provincia di Parma non pervenuti.
- 17/03/2020: dati Provincia di Rimini non aggiornati
- 16/03/2020: dati P.A. Trento e Puglia non pervenuti.
- 11/03/2020: dati Regione Abruzzo non pervenuti.
- 10/03/2020: dati Regione Lombardia parziali.
- 07/03/2020: dati Brescia +300 esiti positivi
# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$totale_casi,
dy = reldiff(COVID19$totale_casi))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 14288.15356 1732.36505 8.248 0.0000000000394 ***
## th2 0.04812 0.00256 18.794 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17240 on 54 degrees of freedom
##
## Number of iterations to convergence: 12
## Achieved convergence tolerance: 0.000006388mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 180172.9435 2389.1976 75.41 <2e-16 ***
## xmid 34.3214 0.3078 111.51 <2e-16 ***
## scal 7.5142 0.2024 37.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3657 on 53 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000003701mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 1000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 212414.913061 1904.786070 111.52 <2e-16 ***
## b2 9.051499 0.198570 45.58 <2e-16 ***
## b3 0.933149 0.000912 1023.20 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1334 on 53 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000001293richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "plinear",
control = nls.control(maxiter = 1000, tol = 0.1))
# algorithm is not converging...
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 223450.286127 2694.679790 82.92 <2e-16 ***
## th2 0.059520 0.001233 48.26 <2e-16 ***
## th3 6.420457 0.197730 32.47 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1340 on 53 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 0.009097
# library(nlmrt)
# mod4 = nlxb(y ~ th1*(1 - exp(-th2*x))^th3,
# data = data, start = start, trace = TRUE)models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -624.7198 | 3 | 0.9342581 | 1255.4396 | 1255.9012 | 1261.5157 | |
| Logistic model | -537.3694 | 4 | 0.9971381 | 1082.7388 | 1083.5231 | 1090.8402 | |
| Gompertz model | -480.8797 | 4 | 0.9995649 | 969.7594 | 970.5437 | 977.8608 | *** |
| Richards model | -481.1615 | 4 | 0.9995937 | 970.3231 | 971.1074 | 978.4245 |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Infected", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 10000),
minor_breaks = seq(0, coef(mod2)[1], by = 5000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(100,NA)) +
labs(y = "Infected (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,c("fit2", "fit3")]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 10000),
minor_breaks = seq(0, max(ylim), by = 5000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 57 2020-04-20 221905 175873 267863
## 571 2020-04-20 171774 162528 179024
## 572 2020-04-20 178250 175017 181396
## 573 2020-04-20 179401 175941 182832
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 10000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$deceduti,
dy = reldiff(COVID19$deceduti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 1303.061586 176.677393 7.375 0.00000000101 ***
## th2 0.054741 0.002808 19.491 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2129 on 54 degrees of freedom
##
## Number of iterations to convergence: 13
## Achieved convergence tolerance: 0.00000308mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 24084.3342 314.4080 76.60 <2e-16 ***
## xmid 36.9993 0.2737 135.18 <2e-16 ***
## scal 6.9418 0.1741 39.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 426.6 on 53 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000004877mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# manually set starting values
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 10000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 29020.9034275 248.7128516 116.68 <2e-16 ***
## b2 12.7927126 0.2953489 43.31 <2e-16 ***
## b3 0.9298470 0.0008524 1090.85 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 144.7 on 53 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000006577richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 30111.093774 326.492675 92.23 <2e-16 ***
## th2 0.065659 0.001126 58.30 <2e-16 ***
## th3 9.799704 0.299890 32.68 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 148.3 on 53 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 0.000001097models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -507.5984 | 3 | 0.9440412 | 1021.1967 | 1021.6583 | 1027.2728 | |
| Logistic model | -417.0432 | 4 | 0.9978183 | 842.0864 | 842.8707 | 850.1878 | |
| Gompertz model | -356.5087 | 4 | 0.9997081 | 721.0174 | 721.8017 | 729.1188 | *** |
| Richards model | -357.8641 | 4 | 0.9997060 | 723.7282 | 724.5125 | 731.8296 |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Deceased", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Deceased (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 57 2020-04-20 29517 23668 35658
## 571 2020-04-20 22806 21683 23676
## 572 2020-04-20 23701 23347 24060
## 573 2020-04-20 23805 23451 24226
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$dimessi_guariti,
dy = reldiff(COVID19$dimessi_guariti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 1169.793269 110.595964 10.58 9.04e-15 ***
## th2 0.067433 0.001904 35.41 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1997 on 54 degrees of freedom
##
## Number of iterations to convergence: 11
## Achieved convergence tolerance: 0.000009108mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 62994.0392 2199.8614 28.64 <2e-16 ***
## xmid 47.4676 0.6936 68.44 <2e-16 ***
## scal 8.9369 0.2389 37.40 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 721.1 on 53 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.00000168mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 136122.96224 7653.83795 17.79 <2e-16 ***
## b2 8.57509 0.16423 52.21 <2e-16 ***
## b3 0.96364 0.00115 838.10 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 385.9 on 53 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000004477richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, # algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 283802.345265 40917.833983 6.936 5.72e-09 ***
## th2 0.018788 0.001701 11.042 2.34e-15 ***
## th3 4.204657 0.173514 24.232 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 336.4 on 53 degrees of freedom
##
## Number of iterations to convergence: 19
## Achieved convergence tolerance: 0.000001932models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -504.0188 | 3 | 0.9849809 | 1014.0375 | 1014.4991 | 1020.1136 | |
| Logistic model | -446.4414 | 4 | 0.9979215 | 900.8829 | 901.6672 | 908.9843 | |
| Gompertz model | -411.4305 | 4 | 0.9993218 | 830.8610 | 831.6454 | 838.9625 | |
| Richards model | -403.7375 | 4 | 0.9994721 | 815.4751 | 816.2594 | 823.5765 | *** |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Recovered", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Recovered (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 57 2020-04-20 54627 48682 60623
## 571 2020-04-20 46865 44725 48503
## 572 2020-04-20 48193 47075 49153
## 573 2020-04-20 48619 47752 49353
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 5000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))df = data.frame(date = COVID19$data,
positives = c(NA, diff(COVID19$totale_casi)),
swabs = c(NA, diff(COVID19$tamponi)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
# df$y = df$positives/df$swabs
df$y = df$positives/c(NA, zoo::rollmean(df$swabs, 2))
df = subset(df, swabs > 50)
# DT::datatable(df[,-4], )ggplot(df, aes(x = date)) +
geom_point(aes(y = swabs, color = "swabs"), pch = 19) +
geom_line(aes(y = swabs, color = "swabs")) +
geom_point(aes(y = positives, color = "positives"), pch = 0) +
geom_line(aes(y = positives, color = "positives")) +
labs(x = "", y = "Number of cases", color = "") +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = palette()[c(2,1)]) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))ggplot(df, aes(x = date, y = y)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col=palette()[4]) +
geom_line(size = 0.5, col=palette()[4]) +
labs(x = "", y = "% positives among admnistered swabs (two-day rolling mean)") +
scale_y_continuous(labels = scales::percent_format(),
breaks = seq(0, 0.5, by = 0.05)) +
coord_cartesian(ylim = c(0,max(df$y, na.rm = TRUE))) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))df = data.frame(date = COVID19$data,
hospital = c(NA, diff(COVID19$totale_ospedalizzati)),
icu = c(NA, diff(COVID19$terapia_intensiva)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1ggplot(df, aes(x = date, y = hospital)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col = "orange") +
geom_line(size = 0.5, col = "orange") +
labs(x = "", y = "Change hospitalized patients") +
coord_cartesian(ylim = range(df$hospital, na.rm = TRUE)) +
scale_y_continuous(minor_breaks = seq(min(df$hospital, na.rm = TRUE),
max(df$hospital, na.rm = TRUE),
by = 100)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))ggplot(df, aes(x = date, y = icu)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col = "red2") +
geom_line(size = 0.5, col = "red2") +
labs(x = "", y = "Change ICU patients") +
coord_cartesian(ylim = range(df$icu, na.rm = TRUE)) +
scale_y_continuous(minor_breaks = seq(min(df$icu, na.rm = TRUE),
max(df$icu, na.rm = TRUE),
by = 10)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))